29 research outputs found

On Static Timing Analysis of GPU Kernels

Author: Hirvisalo Vesa
Publication venue: OASIcs - OpenAccess Series in Informatics. 14th International Workshop on Worst-Case Execution Time Analysis
Publication date: 01/01/2014

We study static timing analysis of programs running on GPU accelerators. Such programs follow a data-parallel programming model that allows massive parallelism on manycore processors. Data-parallel programming and GPUs as accelerators have come into wide use in recent years. Timing analysis of programs running on single-core machines is well understood and is also applied in practice. For multicore and manycore machines, however, timing analysis remains a significant problem that has not yet been properly solved. In this paper, we present static timing analysis of GPU kernels based on a method that we call abstract CTA simulation. Cooperative Thread Arrays (CTAs) are the basic execution structure of GPU devices, which execute them in thread groups called warps. Abstract CTA simulation is based on static analysis of thread divergence in warps and of their abstract scheduling.

Dagstuhl Research Online Publication Server

Using static program analysis to compile fast cache simulators

Author: Hirvisalo Vesa
Publication venue: Teknillinen korkeakoulu
Publication date: 26/03/2004

This thesis presents a generic approach to compiling fast execution-driven simulators and applies it to cache simulation of programs. The resulting cache simulation method reduces the time needed for cache performance evaluations without losing accuracy in the results. Fast cache simulators are needed in the performance analysis of software systems. To properly understand the cache behavior caused by a program, simulations must be performed with a sufficient number of inputs. Traditional simulation of the memory operations of a program can be orders of magnitude slower than the execution of the program. This leads to simulation times that are often infeasible in software development. The approach of this thesis is based on using static cache analysis to guide partial evaluation and slicing of simulators. Because of redundancy in the memory access patterns of typical programs, an execution-driven cache simulator program can be partially evaluated during its compilation. Program slicing can be used to remove the computations that have no effect on the simulation result. The static cache analysis presented in this thesis is generic. The analysis is designed especially for programs that use dynamic addressing. The thesis assumes an address analysis that gives the cache analysis static information about cache aliases and cache conflicts between accessed memory lines. To determine the memory references that always cause cache hits or cache misses, the thesis describes both must and may analyses of cache states. The cache state analysis is built using abstract interpretation, on the basis of which the soundness of the analysis is proved. The potential performance of the method was evaluated experimentally. The thesis describes both a tool set implementing the cache analysis method and experiments done with the tool set. The experiments indicate that a simple implementation is capable of significantly speeding up the simulations.
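To illustrate what a must analysis of cache states can look like, here is a minimal sketch in the textbook style of abstract interpretation for LRU caches. It is not taken from the thesis: the single fully associative cache set, the age-bound representation, and the names `must_update`, `must_join`, and `always_hits` are all assumptions made for this example.

```python
def must_update(state, line, ways):
    """Abstract LRU update for a must analysis (illustrative assumption).

    `state` maps a memory line to an upper bound on its LRU age (0 = youngest).
    A line absent from `state` is not guaranteed to be cached.
    """
    old_age = state.get(line, ways)  # absent lines behave as if age >= ways
    new_state = {}
    for other, age in state.items():
        if other == line:
            continue
        # Only lines younger than the accessed line grow older.
        aged = age + 1 if age < old_age else age
        if aged < ways:  # lines whose bound reaches `ways` may be evicted
            new_state[other] = aged
    new_state[line] = 0
    return new_state


def must_join(a, b):
    """Merge two control-flow paths: keep lines guaranteed on both, with the worse age."""
    return {line: max(a[line], b[line]) for line in a.keys() & b.keys()}


def always_hits(state, line):
    """An access is classified as an always-hit if the line is in the must state."""
    return line in state


# Hypothetical program: access "a", then either access "b" or do nothing.
ways = 2
after_a = must_update({}, "a", ways)
then_branch = must_update(after_a, "b", ways)
else_branch = after_a
merged = must_join(then_branch, else_branch)
```

In this sketch `merged` still contains line "a" (with age bound 1), so a following access to "a" would be classified as an always-hit on either path, while "b" is not guaranteed to be cached.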

CiteSeerX

Aaltodoc Publication Archive

Transitive closure algorithm MEMTC and its performance analysis

Authors: Hirvisalo Vesa, Nuutila Esko, Soisalon-Soininen Eljas
Publication venue: Elsevier Science B.V.
Publication date: 01/06/2001

In this paper, we present a new algorithm for computing the full transitive closure, designed for operation in layered memories. The algorithm is based on strongly connected component detection and on a very compact representation of data. We analyze the average-case performance of the algorithm experimentally in an environment where two layers of memory of different speeds are used. In our analysis, we use trace-based simulation of memory operations.
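The general idea of computing a transitive closure through strongly connected components can be sketched as follows. This is a plain in-memory illustration using Tarjan's algorithm, not the MEMTC algorithm itself. It does not model layered memories or the compact data representation, and all function names are assumptions made for this example.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; components come out in reverse topological order."""
    index, low, comp_of = {}, {}, {}
    stack, on_stack, comps = [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            members = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp_of[w] = len(comps)
                members.append(w)
                if w == v:
                    break
            comps.append(members)

    for v in graph:
        if v not in index:
            visit(v)
    return comps, comp_of


def transitive_closure(graph):
    """Map each vertex to the set of vertices reachable by a path of length >= 1."""
    comps, comp_of = strongly_connected_components(graph)
    reach = []  # reach[c] = ids of components reachable from component c
    for c, members in enumerate(comps):
        r = set()
        for v in members:
            for w in graph.get(v, ()):
                d = comp_of[w]
                r.add(d)  # d == c means a cycle or self-loop inside c
                if d != c:
                    r |= reach[d]  # d was emitted earlier, so reach[d] exists
        reach.append(r)
    closure = {}
    for c, members in enumerate(comps):
        targets = set()
        for d in reach[c]:
            targets.update(comps[d])
        for v in members:
            closure[v] = targets  # vertices of one component share one set
    return closure


example = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []}
closure = transitive_closure(example)
```

Because all vertices in one component reach exactly the same vertices, the closure is computed once per component and shared, which is the main saving of the component-based approach. The recursive traversal is only suitable for small graphs.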

Elsevier - Publisher Connector

Experience in Performance Analysis of Large Real-Time Systems

Author: Vesa Hirvisalo
Publication date: 01/01/1998

In this paper, we discuss the experience we gained during three performance engineering projects carried out in co-operation with the telecommunication industry. In each of the projects, we analyzed the performance of a real-time system with large embedded software. Each system was at a different stage of development. We used three different modeling techniques: queueing networks, execution graphs, and message sequence charts. We also applied different approaches to building models. To give guidelines for analyzing large systems, we discuss our method of analysis, the rationale for the choices we made, and the lessons we learned. 1. Introduction In this paper, we discuss performance analysis of large real-time systems. The discussion is based on three case studies, each of which was a performance analysis of a telecommunication system. The performance analyses of the systems were done in close co-operation with the Finnish telecommunication industry. All the analyzed systems were embedded telecomm..

CiteSeerX

Crossref

Transitiivisen sulkeuman laskeminen tietokantaympäristössä [Computing the transitive closure in a database environment]

Author: Hirvisalo Vesa
Publication date: 01/01/1994

Aaltodoc Publication Archive

Ohjelmajäljen käännösaikainen tiivistäminen muistisimulointia varten [Compile-time compaction of a program trace for memory simulation]

Author: Hirvisalo Vesa
Publication date: 01/01/1998

Aaltodoc Publication Archive

DBE: A Tool for Trace Driven Memory Simulation

Author: Vesa Hirvisalo

DBE is an experimental tool designed for trace-driven simulation of processor caches and disk buffers. Trace-driven simulation is flexible and requires no special hardware, but generating traces can be too slow and the resulting traces too large to handle. To overcome this problem, DBE uses compile-time trace compaction, which can yield smaller traces and faster run times.
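For readers unfamiliar with trace-driven simulation, here is a minimal sketch of the basic technique: a recorded sequence of memory addresses is replayed against a cache model to count hits and misses. It does not reflect DBE's implementation or its trace compaction. The function name `simulate_cache`, the set-associative LRU organization, and the default parameters are assumptions made for this example.

```python
from collections import OrderedDict


def simulate_cache(trace, num_sets=4, ways=2, line_size=16):
    """Replay a trace of byte addresses against a set-associative LRU cache.

    Returns (hits, misses). All parameters are illustrative assumptions.
    """
    # One ordered dict per set; the first key is the least recently used line.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in trace:
        line = addr // line_size
        cache_set = sets[line % num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            hits += 1
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict the LRU line
            cache_set[line] = True
    return hits, misses


# Hypothetical trace: addresses 0, 64 and 128 all map to set 0 and compete for 2 ways.
hits, misses = simulate_cache([0, 4, 16, 0, 64, 128, 0])
```

In this sketch each address in the trace costs one step of simulation, which is why long traces become expensive and why a shorter trace directly shortens the run.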

CiteSeerX